Likelihood / Regression
The other side of the coin
Statistics: Interested in estimating population-level characteristics; i.e., the parameters
\[ \begin{align*} y &\rightarrow f(y|\boldsymbol{\theta}) \\ \end{align*} \]
Estimation
- likelihood
- Bayesian
Likelihood
Given a statistical model, all the evidence/information in a sample (\(\textbf{y}\), i.e., data) relevant to model parameters (\(\theta\)) is contained in the likelihood function.
. . .
The information an observable random variable (\(\textbf{y}\)) has about an unknown parameter \(\theta\) upon which the probability of \(\textbf{y}\), \(f(\textbf{y};\theta)\), depends.
. . .
. . .
To learn about \(\theta\) from \(\textbf{y}\), we need to link them together via a special function, \(f(\textbf{y};\theta)\)
## Statistical Information
The pieces:
::: incremental
- The sample data, \(\textbf{y}\)
- A probability function for \(\textbf{y}\): \(f(\textbf{y};\theta)\), also written \([\textbf{y}|\theta]\)
- The unknown parameter, \(\theta\), specified in the probability function
:::
The Likelihood Function
What can we say about our parameters using this function?
\[ \begin{align*} \mathcal{L}(\boldsymbol{\theta}|y) = P(y|\boldsymbol{\theta}) = f(y|\boldsymbol{\theta}) \end{align*} \]
. . .
The likelihood (\(\mathcal{L}\)) of the unknown parameters, given our data, can be calculated using our probability function.
. . .
# probability density of y = 10 if mu = 8 (sd = 1, matching the output shown)
dnorm(x = 10, mean = 8, sd = 1)
[1] 0.05399097
. . .
If we knew the mean were truly 8, this would also be the probability density of the observation \(y = 10\).
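As a preview of trying many parameter guesses at once (a sketch, not from the slides; sd = 1 assumed throughout), `dnorm` is vectorized over its `mean` argument, so we can evaluate the likelihood of several candidate means for the same observation in one call:

```r
# likelihood of y = 10 under several guesses for the mean (sd assumed = 1)
guesses = c(6, 8, 10, 12)
L = dnorm(10, mean = guesses, sd = 1)
L
# the guess mean = 10 has the highest likelihood, since it sits
# exactly at the observed value
guesses[which.max(L)]
```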
Many Parameter Guesses
. . .

Maximum Likelihood Properties
MLEs are consistent. As sample size increases, they converge to the true parameter value.
MLEs are asymptotically unbiased. \(E[\hat{\theta}]\) converges to \(\theta\) as the sample size gets larger.
There is no guarantee that an MLE is unbiased at small sample sizes. This can be tested (e.g., by simulation)!
MLEs are asymptotically efficient. As the sample size gets larger, they attain the minimum variance among consistent estimators.
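These properties can be checked by simulation (a minimal sketch, not from the slides; the true values mean = 5 and sd = 2 are arbitrary choices). The Normal sd MLE divides by \(n\), not \(n-1\), so it is noticeably biased low at small \(n\) but the bias shrinks as \(n\) grows:

```r
# simulate the MLE of a Normal sd at small vs. large sample sizes
set.seed(42)
sd_mle <- function(n) {
  y <- rnorm(n, mean = 5, sd = 2)   # true sd = 2 (assumed for illustration)
  sqrt(mean((y - mean(y))^2))       # MLE of sd: n in the denominator
}
small <- replicate(5000, sd_mle(5))    # n = 5
large <- replicate(5000, sd_mle(500))  # n = 500
mean(small)  # noticeably below 2: biased at small n
mean(large)  # very close to 2: bias vanishes asymptotically
```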
Statistics and PDF Example
What is the mean height of King Penguins?

Statistics and PDF Example
We go and collect data,
\(\boldsymbol{y} = \begin{bmatrix} 4.34 & 3.53 & 3.75 \end{bmatrix}\)
. . .
Let’s decide to use the Normal Distribution as our PDF.
. . .
\[ \begin{align*} f(y_1 = 4.34|\mu,\sigma) &= \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{y_{1}-\mu}{\sigma}\right)^2} \\ \end{align*} \]
. . .
AND
\[ \begin{align*} f(y_2 = 3.53|\mu,\sigma) &= \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{y_{2}-\mu}{\sigma}\right)^2} \\ \end{align*} \]
. . .
AND
\[ \begin{align*} f(y_3 = 3.75|\mu,\sigma) &= \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{y_{3}-\mu}{\sigma}\right)^2} \\ \end{align*} \]
. . .
Or simply,
\[ \textbf{y} \stackrel{iid}{\sim} \text{Normal}(\mu, \sigma) \] . . .
\(iid\) = independent and identically distributed
. . .
Continued
The joint probability of our data with shared parameters \(\mu\) and \(\sigma\),
\[ \begin{align*} & P(Y_{1} = y_1,Y_{2} = y_2, Y_{3} = y_3 | \mu, \sigma) \\ &= \mathcal{L}(\mu, \sigma|\textbf{y}) \end{align*} \]
. . .
IF each \(y_{i}\) is independent, the joint probability of our data is simply the product of the three probability densities,
\[ \begin{align*} =& f(y_{1}|\mu, \sigma)\times f(y_{2}|\mu, \sigma)\times f(y_{3}|\mu, \sigma) \end{align*} \]
We can do this because we are assuming that knowing one value (\(y_1\)) gives us no new information about another value (\(y_2\)).
. . .
\[ \begin{align*} =& \prod_{i=1}^{3} f(y_{i}|\mu, \sigma) \\ =& \mathcal{L}(\mu, \sigma|y_{1},y_{2},y_{3}) \end{align*} \]
Code
Translate the math to code…
# penguin height data
y=c(4.34, 3.53, 3.75)
#Joint likelihood of mu=3, sigma =1, given our data
prod(dnorm(y,mean=3,sd=1))
[1] 0.01696987
. . .
Calculate the likelihood of many guesses of \(\mu\) and \(\sigma\) simultaneously,
# The Guesses
mu=seq(0,6,0.05)
sigma=seq(0.01,2,0.05)
try=expand.grid(mu,sigma)
colnames(try)=c("mu","sigma")
# function
fun=function(a,b){
prod(dnorm(y,mean=a,sd=b))
}
# mapply the function with the inputs
likelihood=mapply(a=try$mu,b=try$sigma, FUN=fun)
# maximum likelihood of parameters
try[which.max(likelihood),]
      mu sigma
925 3.85  0.36
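As a quick check on the grid search (a sketch, not from the slides): for the Normal distribution the MLEs have closed forms. \(\hat{\mu}\) is the sample mean and \(\hat{\sigma}\) is the standard deviation with \(n\) (not \(n-1\)) in the denominator, and both land right next to the grid-search answer:

```r
# penguin height data (same as above)
y = c(4.34, 3.53, 3.75)
# closed-form Normal MLEs
mu_hat = mean(y)                         # 3.873, near the grid value 3.85
sigma_hat = sqrt(mean((y - mean(y))^2))  # ~0.342, near the grid value 0.36
c(mu_hat, sigma_hat)
```

The small discrepancy is just grid resolution: the guesses were spaced 0.05 apart.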
. . .
Likelihood plot (3D)
Sample Size
What happens to the likelihood if we increase the sample size to N=100?
. . .
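One way to explore this (a sketch with simulated data, taking the earlier grid-search answer \(\mu = 3.85\), \(\sigma = 0.36\) as the assumed truth): with 100 observations the product of densities becomes a very small number, so it is safer to sum log-densities via `dnorm(..., log = TRUE)`. The maximum also lands much closer to the truth, because the likelihood surface sharpens as \(N\) grows:

```r
set.seed(1)
y100 = rnorm(100, mean = 3.85, sd = 0.36)  # simulated heights (assumed truth)
# reuse the same parameter grid as before
mu = seq(0, 6, 0.05)
sigma = seq(0.01, 2, 0.05)
try = expand.grid(mu = mu, sigma = sigma)
# sum of log-densities avoids numerical underflow from multiplying
# 100 small numbers together
loglik = mapply(function(a, b) sum(dnorm(y100, mean = a, sd = b, log = TRUE)),
                try$mu, try$sigma)
try[which.max(loglik), ]  # lands very near the true mu and sigma
```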